
perf: i64 ABI specialization for integer-pure numeric functions#501

Merged
cs01 merged 1 commit into main from feat/int-specialization
Apr 13, 2026

Conversation

Owner

@cs01 cs01 commented Apr 13, 2026

Summary

Adds a whole-module analysis pass that detects integer-pure numeric functions — functions where every parameter is integer-valued, every return value is integer-valued, every intermediate expression is integer-preserving, and the function is never used as a first-class value — and specializes them to an i64 ABI instead of the default double ABI.

The canonical motivating case is naive-recursive fib:

function fib(n: number): number {
  if (n <= 1) return n;
  return fib(n - 1) + fib(n - 2);
}

Before this PR, chad correctly narrowed the body operations to sub i64 / icmp sle i64 (the existing integer-analysis pass), but the function signature stayed define double @_cs_fib(double %arg0). Every recursive call paid fptosi double→i64 on entry + sitofp i64→double on arg prep, and the combine was fadd double %a, %b (3-cycle latency) instead of add i64 %a, %b (1-cycle). The whole recursion chain serialized on the fadd latency wall.
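For contrast, the pre-PR IR had roughly this shape (a hedged reconstruction from the description above — register names are illustrative, not chad's actual output):

```llvm
; double ABI: the body is already integer-narrowed, but every call
; pays a conversion round-trip at the ABI boundary.
define double @_cs_fib(double %arg0) {
entry:
  %n = fptosi double %arg0 to i64        ; paid on every entry
  ; ... icmp sle i64 / sub i64 body ...
  %a = sub i64 %n, 1
  %a.d = sitofp i64 %a to double         ; paid on every arg prep
  %r1 = call double @_cs_fib(double %a.d)
  ; ... second recursive call likewise ...
  %sum = fadd double %r1, %r2            ; 3-cycle latency combine
  ret double %sum
}
```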

After this PR, fib compiles to:

define i64 @_cs_fib(i64 %arg0) {
entry:
  %0 = alloca i64
  store i64 %arg0, i64* %0
  %2 = load i64, i64* %0
  %4 = icmp sle i64 %2, 1
  ...
  %9 = sub i64 %7, 1
  %11 = call i64 @_cs_fib(i64 %9)
  %14 = sub i64 %12, 2
  %16 = call i64 @_cs_fib(i64 %14)
  %17 = add i64 %11, %16
  ret i64 %17
}

No float ops, no round-trip conversions, 1-cycle integer arithmetic the whole way down.

How it works

New file src/codegen/infrastructure/int-specialization-detector.ts (~540 lines) runs markIntSpecializedFunctions(ast) before codegen. A function is eligible iff:

  1. All parameters are declared number (or untyped — chad defaults numeric params to number).
  2. Return type is number or unspecified.
  3. No optional params, no defaults, no async, no declare.
  4. Body has no try / throw / await / for…of / switch (keeps the analysis tractable and avoids edge cases where value type could change inside a handler).
  5. Body calls no other function except itself recursively. No method calls. No calls to stdlib. This is the conservative "leaf or self-recursive" restriction — a later PR can widen it to "calls other intSpecialized functions too."
  6. Every param proves integer-valued via the existing findI64EligibleVariables analysis (started from an integer literal, mutated only through ++/-- or integer-preserving binary ops).
  7. Every return statement produces an integer-shaped expression — integer literal, integer-eligible local read, or a binary op of integer-shaped operands where the op is +/-/*/%/bitwise. Division (/) is explicitly excluded because TS number division can produce non-integers.
  8. The function name does not appear as a first-class value anywhere in the program (see "Escape analysis" below).

When all 8 conditions hold, func.intSpecialized = true is set, and the function-generator at src/codegen/infrastructure/function-generator.ts picks that up to emit the i64 ABI: every double param becomes i64, the return type becomes i64, and the entry-block fptosi %arg → i64 on numeric params is skipped (since the arg already arrives as i64).
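As a sketch of the gate — the `FuncSummary` shape and `isIntSpecializable` name below are invented for illustration; the real logic lives in `int-specialization-detector.ts` and works over chad's AST, not a pre-digested summary:

```typescript
// Hypothetical per-function summary covering the 8 conditions.
type FuncSummary = {
  name: string;
  paramTypes: string[];          // declared param types ("number" after defaulting)
  returnType: string | null;     // null = unspecified
  hasOptionalOrDefaultParams: boolean;
  isAsync: boolean;
  bannedSyntax: boolean;         // try / throw / await / for…of / switch in body
  callees: string[];             // names of functions called in the body
  returnsIntegerShaped: boolean; // every return is an integer-shaped expression
  escapes: boolean;              // name used as a first-class value anywhere
};

function isIntSpecializable(f: FuncSummary): boolean {
  const paramsOk = f.paramTypes.every(t => t === "number");          // cond 1
  const returnOk = f.returnType === null || f.returnType === "number"; // cond 2
  const onlySelfCalls = f.callees.every(c => c === f.name);          // cond 5
  return paramsOk && returnOk &&
    !f.hasOptionalOrDefaultParams && !f.isAsync &&                   // cond 3
    !f.bannedSyntax &&                                               // cond 4
    onlySelfCalls &&
    f.returnsIntegerShaped &&                                        // conds 6-7
    !f.escapes;                                                      // cond 8
}

// fib meets every condition…
const fib: FuncSummary = {
  name: "fib", paramTypes: ["number"], returnType: "number",
  hasOptionalOrDefaultParams: false, isAsync: false, bannedSyntax: false,
  callees: ["fib"], returnsIntegerShaped: true, escapes: false,
};
// …while the reduce-callback `add` fails only the escape check.
const addEscaped: FuncSummary = { ...fib, name: "add", callees: [], escapes: true };
```

Note the self-recursion carve-out in condition 5: `fib` calling `fib` is not a "foreign call", which is exactly what makes the motivating benchmark eligible.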

Call-site lowering was updated at src/codegen/expressions/calls.ts so that when the callee is intSpecialized, numeric args are passed as i64 directly rather than fptosi'd at the call site. This closes the loop — calls to intSpecialized functions use the i64 ABI end-to-end.

Escape analysis (the critical correctness guard)

The first draft of this pass (without escape analysis) silently miscompiled programs like:

function add(acc: number, x: number): number { return acc + x; }
const sum = [1, 2, 3, 4, 5].reduce(add, 0);  // → returned garbage

add looks eligible by every local criterion (pure integer, no foreign calls, integer-shaped return), so the detector marked it specialized. But reduce(add, 0) passes add as a callback — the runtime reduce stored it as a function pointer with the canonical double(double, double) signature and called it with double arguments. Those double bits got reinterpreted as i64 when the specialized add body read them, producing garbage.

The fix: a whole-AST walker (collectEscapedFunctionNames) that scans every expression position in every function body, every class method body, every top-level statement, and every top-level expression, collecting all VariableNode names. Any top-level function whose name appears in that set is "escaped" and gets excluded from specialization. The walker also treats MethodCallNode.method as an escape reference — chad's method-call lowering falls back to calling a top-level function when the receiver has no matching class method, so obj.add(5, 7) (where obj is an object literal and add is a top-level function) goes through the canonical double ABI and must not be specialized.

The walker handles every expression type in chad's AST: variable, call, method_call, new, binary, unary, member_access, index_access, array, object, map, set, template_literal, conditional, await, member_access_assignment, index_access_assignment, type_assertion, spread_element, arrow_function (both expression-body and block-body shapes). Statement walker handles variable_declaration, assignment, return, if, while, do_while, for, for_of, throw, try, switch, block, and expression-as-statement.
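A minimal sketch of the collection idea, using an invented toy AST rather than chad's real node types (the actual walker is collectEscapedFunctionNames and covers every node kind listed above):

```typescript
// Toy AST for illustration only.
type Expr =
  | { kind: "variable"; name: string }
  | { kind: "call"; callee: Expr; args: Expr[] }
  | { kind: "method_call"; receiver: Expr; method: string; args: Expr[] }
  | { kind: "number"; value: number };

function collectEscapes(e: Expr, out: Set<string>): void {
  switch (e.kind) {
    case "variable":
      out.add(e.name); // any bare name read counts as a potential escape
      break;
    case "call":
      // Assumption in this sketch: a direct callee name is not a
      // VariableNode in chad's AST, so `fib(n - 1)` does not escape fib —
      // only the arguments (and any computed callee) are scanned.
      if (e.callee.kind !== "variable") collectEscapes(e.callee, out);
      for (const a of e.args) collectEscapes(a, out);
      break;
    case "method_call":
      out.add(e.method); // obj.add(…) may fall back to top-level `add`
      collectEscapes(e.receiver, out);
      for (const a of e.args) collectEscapes(a, out);
      break;
    case "number":
      break;
  }
}

// nums.reduce(add, 0): `add` appears in argument position → escapes.
const escaped = new Set<string>();
collectEscapes({
  kind: "method_call",
  receiver: { kind: "variable", name: "nums" },
  method: "reduce",
  args: [{ kind: "variable", name: "add" }, { kind: "number", value: 0 }],
}, escaped);

// fib(1): direct self-call → no escape recorded.
const direct = new Set<string>();
collectEscapes({
  kind: "call",
  callee: { kind: "variable", name: "fib" },
  args: [{ kind: "number", value: 1 }],
}, direct);
```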

The analysis is conservative — a local variable named the same as a top-level function triggers a false positive (the function won't be specialized even though the local shadows it). That's fine — specialization is an optimization, losing it on a corner case is strictly safer than miscompiling.
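The shadowing case in plain TypeScript — both functions behave correctly either way; the point is only that a name-based walker sees a variable read named `add` inside `usesShadow` and conservatively skips specializing the top-level `add`:

```typescript
function add(a: number, b: number): number { return a + b; }

function usesShadow(): number {
  const add = 10;  // local binding, unrelated to the function above
  return add + 1;  // a name-based escape walker can't tell these apart
}
```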

Correctness verification

Repro for the original callback bug (would miscompile pre-escape-analysis):

function add(acc: number, x: number): number { return acc + x; }
const nums: number[] = [1, 2, 3, 4, 5];
const sum = nums.reduce(add, 0);
console.log(sum);  // Must print 15

Confirmed: prints 15 after this PR (ran the existing tests/fixtures/arrays/array-reduce.ts fixture which exercises this exact shape through TEST_PASSED).

Repro for the method-call dispatch bug (fix/object-method in commit):

function add(a, b) { return a + b; }
function testMethod() {
  const obj = { add: 0 };
  return obj.add(5, 7);  // chad falls back to top-level `add`
}
process.exit(testMethod());  // Must exit 12

Confirmed: exits 12 after this PR.

Generated IR for fib:

define i64 @_cs_fib(i64 %arg0) { ... }   ; was: define double @_cs_fib(double %arg0)

Entry no longer has an fptosi on the param. Combine is add i64, not fadd double.

Generated IR for add when used as callback:

define double @_cs_add(double %arg0, double %arg1) { ... }   ; unchanged, escape caught

Measurements

Apple Silicon M-series, macOS ARM64, best of 3, chad built from this branch (rebased onto current origin/main which includes #499).

| bench | baseline | this PR | delta |
| --- | --- | --- | --- |
| fibonacci | 792 ms | 509 ms | -36%, 1.56× faster |
| matmul | 112 ms | 111 ms | tie (within noise) |
| sieve | 12 ms | 12 ms | tie |
| sorting | 140 ms | 140 ms | tie |
| montecarlo | 266 ms | 265 ms | tie |
| nbody | 827 ms | 827 ms | tie |
| binarytrees | 620 ms | 620 ms | tie |

Only fibonacci moves because it's the only benchmark with an integer-pure recursive numeric function that meets all 8 eligibility criteria. All other benchmarks are unchanged, including ones with integer-heavy code (sieve, sorting, montecarlo) — their hot-path functions either escape (passed as callbacks), call stdlib methods, or have non-integer intermediate expressions, so the detector correctly leaves them on the double ABI.

Architecture-independence: the fix replaces fadd double (2-3 cycle latency on both arm64 and x86-64) with add i64 (1 cycle on both), plus removes the per-call fptosi/sitofp conversions. There's no NEON/AVX2 dependence, so the same ~35-50% improvement should land on Linux x86-64 CI as well — I'll be watching the auto-posted benchmark comment to confirm.

Comparison against C and Go on Apple Silicon

After both this PR and #499 (float-literal narrowing), chad's numbers against C (clang -O2 -march=native) and Go (1.26, native):

| bench | C (ms) | chad (ms) | Go (ms) | chad vs C | chad vs Go |
| --- | --- | --- | --- | --- | --- |
| binarytrees | 848 | 620 | 814 | 0.73 | 24% faster |
| fibonacci | 433 | 509 | 579 | 1.17 | 12% faster |
| montecarlo | 266 | 265 | 256 | 1.00 | tied |
| nbody | 777 | 827 | 788 | 1.06 | tied (+5%) |
| matmul | 102 | 111 | 103 | 1.09 | tied (+8%) |
| sorting | 122 | 140 | 125 | 1.15 | tied (+12%) |
| sieve | 8 | 12 | 11 | 1.51 | tied |

Chad now beats Go on fibonacci and binarytrees, ties C on montecarlo, and is within 10% of C on matmul and nbody. The fib win over Go is what this PR unlocks.

Tests

  • npm test — 774/774 pass, 0 failures, 37 suites, 90.9s duration.
  • Updated one assertion in tests/compiler.test.ts:152 that was checking for define double @_cs_add to also accept define i64 @_cs_add — the test was written pre-specialization and the fixture simple-add.js (function add(a, b) { return a + b; } + process.exit(add(5, 7))) is exactly the shape that gets specialized. The test author had already defensively handled the i64 case for the add instruction assertion later in the same test, just missed this one.

No other test changes. Every fixture that produces TEST_PASSED or a specific exit code still does.

Scope / follow-ups

Not in this PR:

  • Specialization across modules (requires cross-module signature propagation).
  • Functions that call other intSpecialized functions (currently rejected as "foreign call"). This would widen coverage meaningfully once added.
  • Class methods (only top-level functions are detected).
  • Functions with for…of / switch / try bodies (rejected by the body-shape gate); these are real limitations that can be relaxed later with more careful analysis.
  • Mixed-mode specialization: a function that sometimes returns an integer and sometimes a float (currently rejected).

These are all strict widenings of the current pass that won't affect already-specialized functions, and can be done in follow-up PRs driven by specific use cases.

Risk: the primary risk is undetected escapes. The walker handles all 21 expression types and 14 statement types in the AST; if a new AST node is added and the walker isn't updated, the new node's children won't be scanned for escapes and a function could be wrongly specialized. Mitigation: the walker's fallback for expression-as-statement dispatches to collectEscapedVarRefsExpr, so new expression types will at least get the full expression walker. New statement types need to be added to collectEscapedVarRefsStmts explicitly.

@github-actions
Contributor

Benchmark Results (Linux x86-64)

| Benchmark | C | ChadScript | Go | Node | Bun | Place |
| --- | --- | --- | --- | --- | --- | --- |
| Binary Trees | 1.575s | 1.267s | 2.763s | 1.189s | 0.969s | 🥉 |
| Cold Start | 1.0ms | 0.8ms | 1.2ms | 28.7ms | 9.8ms | 🥇 |
| Fibonacci | 0.815s | 0.815s | 1.562s | 3.203s | 2.018s | 🥇 |
| File I/O | 0.118s | 0.092s | 0.084s | 0.199s | 0.181s | 🥈 |
| JSON Parse/Stringify | 0.004s | 0.005s | 0.018s | 0.015s | 0.007s | 🥈 |
| Matrix Multiply | 0.449s | 0.999s | 0.638s | 0.379s | 0.335s | #5 |
| Monte Carlo Pi | 0.389s | 0.410s | 0.405s | 2.248s | 6.068s | 🥉 |
| N-Body Simulation | 1.668s | 2.126s | 2.206s | 2.391s | 3.265s | 🥈 |
| Quicksort | 0.215s | 0.245s | 0.213s | 0.262s | 0.228s | #4 |
| SQLite | 0.348s | 0.400s | 0.444s | 0.400s | | 🥈 |
| Sieve of Eratosthenes | 0.016s | 0.029s | 0.018s | 0.040s | 0.037s | 🥉 |
| String Manipulation | 0.008s | 0.046s | 0.016s | 0.035s | 0.028s | #5 |

CLI Tool Benchmarks

| Benchmark | ChadScript | grep | node | xxd | Place |
| --- | --- | --- | --- | --- | --- |
| Hex Dump | 0.437s | | 0.995s | 0.134s | 🥈 |
| Recursive Grep | 0.019s | 0.010s | 0.097s | | 🥈 |

@cs01 cs01 merged commit eb1b25f into main Apr 13, 2026
13 checks passed
@cs01 cs01 deleted the feat/int-specialization branch April 13, 2026 16:48